{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Backtest (high-level API)\n",
    "\n",
    "Tutorial for [NautilusTrader](https://nautilustrader.io/docs/), a high-performance algorithmic trading platform and event-driven backtester.\n",
    "\n",
    "[View source on GitHub](https://github.com/nautechsystems/nautilus_trader/blob/develop/docs/getting_started/backtest_high_level.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "This tutorial walks through how to use a `BacktestNode` to backtest a simple EMA cross strategy\n",
    "on a simulated FX ECN venue using historical quote tick data.\n",
    "\n",
    "The following points will be covered:\n",
    "- Load raw data (external to Nautilus) into the data catalog.\n",
    "- Set up configuration objects for a `BacktestNode`.\n",
    "- Run backtests with a `BacktestNode`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "- Python 3.11+ installed.\n",
    "- [JupyterLab](https://jupyter.org/) or similar installed (`pip install -U jupyterlab`).\n",
    "- [NautilusTrader](https://pypi.org/project/nautilus_trader/) latest release installed (`pip install -U nautilus_trader`).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "## Imports\n",
    "\n",
    "We'll start with all of our imports for the remainder of this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import shutil\n",
    "from decimal import Decimal\n",
    "from pathlib import Path\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "from nautilus_trader.backtest.node import BacktestDataConfig\n",
    "from nautilus_trader.backtest.node import BacktestEngineConfig\n",
    "from nautilus_trader.backtest.node import BacktestNode\n",
    "from nautilus_trader.backtest.node import BacktestRunConfig\n",
    "from nautilus_trader.backtest.node import BacktestVenueConfig\n",
    "from nautilus_trader.config import ImportableStrategyConfig\n",
    "from nautilus_trader.core.datetime import dt_to_unix_nanos\n",
    "from nautilus_trader.model import QuoteTick\n",
    "from nautilus_trader.persistence.catalog import ParquetDataCatalog\n",
    "from nautilus_trader.persistence.wranglers import QuoteTickDataWrangler\n",
    "from nautilus_trader.test_kit.providers import CSVTickDataLoader\n",
    "from nautilus_trader.test_kit.providers import TestInstrumentProvider"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "As a one-off before we start the notebook, we need to download some sample data for backtesting.\n",
    "\n",
    "For this example we will use FX data from `histdata.com`. Simply go to https://www.histdata.com/download-free-forex-historical-data/?/ascii/tick-data-quotes/ and select an FX pair, then select one or more months of data to download.\n",
    "\n",
    "Examples of downloaded files:\n",
    "\n",
    "- `DAT_ASCII_EURUSD_T_202410.csv` (EUR/USD data for month 2024-10)\n",
    "- `DAT_ASCII_EURUSD_T_202411.csv` (EUR/USD data for month 2024-11)\n",
    "\n",
    "Once you have downloaded the data:\n",
    "\n",
    "1. Copy files like the ones above into one folder, for example `~/Downloads/Data/` (the default used in this notebook).\n",
    "2. Set the `DATA_DIR` variable below to the directory containing the data.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "DATA_DIR = \"~/Downloads/Data/\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "path = Path(DATA_DIR).expanduser()\n",
    "raw_files = sorted(path.glob(\"*.csv\"))  # Only CSV files, in a stable order\n",
    "assert raw_files, f\"Unable to find any histdata files in directory {path}\"\n",
    "raw_files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "## Loading data into the Parquet data catalog\n",
    "\n",
    "Histdata stores the FX data in CSV/text format with fields `timestamp, bid_price, ask_price, volume`; only the first three are needed here.\n",
    "First, load this raw data into a `pandas.DataFrame` with a schema compatible with Nautilus quotes.\n",
    "\n",
    "Then create Nautilus `QuoteTick` objects by processing the DataFrame with a `QuoteTickDataWrangler`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here we just take the first data file found and load into a pandas DataFrame\n",
    "df = CSVTickDataLoader.load(\n",
    "    file_path=raw_files[0],                                   # Input 1st CSV file\n",
    "    index_col=0,                                              # Use 1st column in data as index for dataframe\n",
    "    header=None,                                              # There are no column names in CSV files\n",
    "    names=[\"timestamp\", \"bid_price\", \"ask_price\", \"volume\"],  # Specify names to individual columns\n",
    "    usecols=[\"timestamp\", \"bid_price\", \"ask_price\"],          # Read only these columns from CSV file into dataframe\n",
    "    parse_dates=[\"timestamp\"],                                # Specify columns containing date/time\n",
    "    date_format=\"%Y%m%d %H%M%S%f\",                            # Format for parsing datetime\n",
    ")\n",
    "\n",
    "# Let's make sure data are sorted by timestamp\n",
    "df = df.sort_index()\n",
    "\n",
    "# Preview of loaded dataframe\n",
    "df.head(2)"
   ]
  },
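  {
   "cell_type": "markdown",
   "id": "9a",
   "metadata": {},
   "source": [
    "The `date_format` string above has to match the raw timestamps exactly.\n",
    "As a minimal sketch, the cell below parses two made-up sample rows using only the standard library.\n",
    "The sample values and the `parse_row` helper are illustrative assumptions; they are not part of Nautilus or of the downloaded files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "from datetime import datetime\n",
    "\n",
    "# Made-up rows in the layout assumed above: timestamp, bid, ask, volume\n",
    "sample_lines = [\n",
    "    \"20241001 000000123,1.11350,1.11354,0\",\n",
    "    \"20241001 000001456,1.11351,1.11355,0\",\n",
    "]\n",
    "\n",
    "\n",
    "def parse_row(row):\n",
    "    # Same format string as `date_format` in the loader call above\n",
    "    ts = datetime.strptime(row[0], \"%Y%m%d %H%M%S%f\")\n",
    "    return ts, float(row[1]), float(row[2])  # The volume column is dropped\n",
    "\n",
    "\n",
    "rows = [parse_row(row) for row in csv.reader(sample_lines)]\n",
    "rows[0]"
   ]
  },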
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Process quotes using a wrangler\n",
    "EURUSD = TestInstrumentProvider.default_fx_ccy(\"EUR/USD\")\n",
    "wrangler = QuoteTickDataWrangler(EURUSD)\n",
    "\n",
    "ticks = wrangler.process(df)\n",
    "\n",
    "# Preview: see first 2 ticks\n",
    "ticks[0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11",
   "metadata": {},
   "source": [
    "See the [Loading data](../concepts/data) guide for further details.\n",
    "\n",
    "Next, instantiate a `ParquetDataCatalog`, passing in the directory where the data will be stored (here, a `catalog` folder under the current working directory).\n",
    "Then write the instrument and tick data to the catalog. Writing the data should only take a couple of minutes, depending on how many months you include.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "CATALOG_PATH = Path.cwd() / \"catalog\"\n",
    "\n",
    "# Clear if it already exists, then create fresh\n",
    "if CATALOG_PATH.exists():\n",
    "    shutil.rmtree(CATALOG_PATH)\n",
    "CATALOG_PATH.mkdir(parents=True)\n",
    "\n",
    "# Create a catalog instance\n",
    "catalog = ParquetDataCatalog(CATALOG_PATH)\n",
    "\n",
    "# Write instrument to the catalog\n",
    "catalog.write_data([EURUSD])\n",
    "\n",
    "# Write ticks to catalog\n",
    "catalog.write_data(ticks)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "## Using the data catalog\n",
    "\n",
    "After you load data into the catalog, use the `catalog` instance to load data for backtests or research.\n",
    "It contains various methods to pull data from the catalog, such as `.instruments(...)` and `.quote_ticks(...)` (shown below).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get list of all instruments in catalog\n",
    "catalog.instruments()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "# See 1st instrument from catalog\n",
    "instrument = catalog.instruments()[0]\n",
    "instrument"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Query quote ticks from the catalog\n",
    "start = dt_to_unix_nanos(pd.Timestamp(\"2024-10-01\", tz=\"UTC\"))\n",
    "end = dt_to_unix_nanos(pd.Timestamp(\"2024-10-15\", tz=\"UTC\"))\n",
    "selected_quote_ticks = catalog.quote_ticks(instrument_ids=[EURUSD.id.value], start=start, end=end)\n",
    "\n",
    "# Preview: see first 2 ticks\n",
    "selected_quote_ticks[:2]"
   ]
  },
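  {
   "cell_type": "markdown",
   "id": "16a",
   "metadata": {},
   "source": [
    "The `start` and `end` values passed to the query are UNIX timestamps in nanoseconds.\n",
    "As a rough illustration of what that number represents, the sketch below reproduces the conversion for the start date using only the standard library.\n",
    "`to_unix_nanos` is a hypothetical helper written for this example; it is not the Nautilus function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime, timedelta, timezone\n",
    "\n",
    "EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)\n",
    "\n",
    "\n",
    "def to_unix_nanos(dt):\n",
    "    # Whole microseconds since the epoch, scaled to nanoseconds\n",
    "    return (dt - EPOCH) // timedelta(microseconds=1) * 1_000\n",
    "\n",
    "\n",
    "start_ns = to_unix_nanos(datetime(2024, 10, 1, tzinfo=timezone.utc))\n",
    "start_ns"
   ]
  },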
  {
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "## Add venues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "venue_configs = [\n",
    "    BacktestVenueConfig(\n",
    "        name=\"SIM\",\n",
    "        oms_type=\"HEDGING\",\n",
    "        account_type=\"MARGIN\",\n",
    "        base_currency=\"USD\",\n",
    "        starting_balances=[\"1_000_000 USD\"],\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "## Add data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "str(CATALOG_PATH)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_configs = [\n",
    "    BacktestDataConfig(\n",
    "        catalog_path=str(CATALOG_PATH),\n",
    "        data_cls=QuoteTick,\n",
    "        instrument_id=instrument.id,\n",
    "        start_time=start,\n",
    "        end_time=end,\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22",
   "metadata": {},
   "source": [
    "## Add strategies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "strategies = [\n",
    "    ImportableStrategyConfig(\n",
    "        strategy_path=\"nautilus_trader.examples.strategies.ema_cross:EMACross\",\n",
    "        config_path=\"nautilus_trader.examples.strategies.ema_cross:EMACrossConfig\",\n",
    "        config={\n",
    "            \"instrument_id\": instrument.id,\n",
    "            \"bar_type\": \"EUR/USD.SIM-15-MINUTE-BID-INTERNAL\",\n",
    "            \"fast_ema_period\": 10,\n",
    "            \"slow_ema_period\": 20,\n",
    "            \"trade_size\": Decimal(1_000_000),\n",
    "        },\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24",
   "metadata": {},
   "source": [
    "## Configure backtest\n",
    "\n",
    "Nautilus uses a `BacktestRunConfig` object to centralize backtest configuration.\n",
    "The `BacktestRunConfig` is partialable, so you can configure it in stages.\n",
    "This design reduces boilerplate when you create multiple backtest runs (for example when performing a parameter grid search).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25",
   "metadata": {},
   "outputs": [],
   "source": [
    "config = BacktestRunConfig(\n",
    "    engine=BacktestEngineConfig(strategies=strategies),\n",
    "    data=data_configs,\n",
    "    venues=venue_configs,\n",
    ")"
   ]
  },
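  {
   "cell_type": "markdown",
   "id": "25a",
   "metadata": {},
   "source": [
    "For the parameter grid search mentioned above, one possible first step is to generate the strategy parameter combinations as plain dictionaries.\n",
    "The cell below is only a sketch of that step: the parameter ranges and the `param_grid` name are arbitrary choices for illustration, and no Nautilus objects are created.\n",
    "Each dictionary could then be merged into the `config` argument of an `ImportableStrategyConfig` like the one defined earlier, giving one run config per combination."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import product\n",
    "\n",
    "# Illustrative ranges only\n",
    "fast_periods = [5, 10]\n",
    "slow_periods = [20, 40]\n",
    "\n",
    "# One dict per (fast, slow) combination, keeping only pairs where fast < slow\n",
    "param_grid = [\n",
    "    {\"fast_ema_period\": fast, \"slow_ema_period\": slow}\n",
    "    for fast, slow in product(fast_periods, slow_periods)\n",
    "    if fast < slow\n",
    "]\n",
    "param_grid"
   ]
  },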
  {
   "cell_type": "markdown",
   "id": "26",
   "metadata": {},
   "source": [
    "## Run backtest\n",
    "\n",
    "Now we can run the backtest node, which will simulate trading across the entire data stream."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27",
   "metadata": {},
   "outputs": [],
   "source": [
    "node = BacktestNode(configs=[config])\n",
    "\n",
    "results = node.run()\n",
    "results"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
